SUMMARY

This Report describes a series of techniques for processing and
analysing images. Although developed for a specific purpose, namely the
automatic reading of angular information on Askania kinetheodolite film,
most  of  the  techniques  are  quite  general,  and  potentially  applicable  to  a 
wide  variety  of  problems.  The  processes  described  include  real  time 
binarisation  of  a  television  signal,  production  and  analysis  of  projections 
to  determine  the  positions  of  features  of  interest,  and  character 
recognition. 

1 INTRODUCTION


This Report describes image processing and analysis techniques developed
as part of a study of the feasibility of automatically reading Askania
kinetheodolite film. The problem is described in the following paragraphs and
the Report then follows the sequence of processes required for its solution.
The description concentrates on the principles of the techniques as these are
potentially applicable to a wide variety of problems: a great deal of detail
which is specific to the application has been omitted.

The kinetheodolite is essentially a camera on a tracking mount. Fig 1
shows  a  typical  frame  of  film.  The  angle  at  which  the  instrument  is  pointing  is 
shown,  in  elevation  and  azimuth,  in  the  two  areas  at  the  top  corners  of  the 
frame.  It  is  the  reading  of  the  information  in  these  areas  which  this  report 
describes. 

Fig  2  shows  an  enlarged  picture  of  the  azimuth  scale.  The  large  digits 
give  the  whole  number  of  degrees:  they  can  appear  anywhere  across  the  width  of 
the  picture  in  either  of  two  vertical  positions,  below  the  centre,  as  shown,  or 
above.  Moving  with  the  digits  is  a  cursor  whose  position  is  read  against  the 
scale,  the  length  of  which  is  equivalent  to  0.5°.  If  the  main  digits  are  above 
the scale, the reading is in the range x.0° to x.5°; if below, the reading is in
the range x.5° to (x + 1)°. Interpolation to 1/10 of the scale graduations gives
a resolution in the reading of 0.001°. Thus the scale in Fig 2 is showing a
reading of 158.719°.
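
The way the three parts of a reading combine can be sketched as follows. This is only an illustration: the function name, the normalised `cursor_fraction` input (0.0 at the start of the scale, 1.0 at its end) and the assumption that the reading increases in that direction are not taken from the Report.

```python
def scale_reading(whole_degrees, digits_below_scale, cursor_fraction):
    """Combine main digits, digit band and cursor position into degrees.

    cursor_fraction is a hypothetical normalised cursor position along the
    scale, whose full length is equivalent to 0.5 degrees.
    """
    if not 0.0 <= cursor_fraction < 1.0:
        raise ValueError("cursor_fraction must lie in [0, 1)")
    # Digits below the scale indicate the upper half-degree (x.5 to x + 1).
    half_degree = 0.5 if digits_below_scale else 0.0
    # Quote the result to the stated resolution of 0.001 degrees.
    return round(whole_degrees + half_degree + 0.5 * cursor_fraction, 3)


# A cursor 43.8% of the way along the scale, with the digits 158 below it.
reading = scale_reading(158, True, 0.438)
```

With these inputs the function reproduces the reading quoted for Fig 2.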

It  was  required  to  read  this  information  in  less  than  1  second. 

Physically, the equipment used to do this comprises a television camera,
specially designed analogue and digital electronics, and a PDP LSI-11
minicomputer. Many of the processes were implemented in hardware because of the
speed requirement, with the computer being used for final analysis and control.
Fig 3 shows a block diagram of the major components of the system. Solid lines
indicate the flow of information, and the dashed lines show control links
between the computer and other components of the system. Fig 4 is a flow diagram
of the scale reading process: the Report describes the techniques used in the
order indicated by this Figure.

Although the picture contains many shades of grey, in essence it is
two-tone: black and white. The first process required, therefore, is the
conversion of the video signal produced by the camera into a two-level signal.
This significantly reduces the amount of information involved in, and the
complexity of, the remainder of the processing. A simple thresholding method
would not be adequate for the binariser: the reasons for this, and the technique
which was developed, are described in section 2.

The two-level picture is produced in real time with a delay of 2 µs
relative to the grey scale video. The binary signal is sampled to give 1024
one-bit samples along each television scan line. This data rate is too high to
be read directly by the computer, so a special memory was built. This video
memory, which is capable of storing up to 64 lines of the binarised, sampled
television signal, is described in section 3. Techniques for 'projecting' part
of the picture on to the horizontal or vertical axis have been developed. These
are implemented in hardware and the results are analysed in the computer to give
information about the positions of features in the picture, including that of
the cursor relative to the scale lines. The production and analysis of the
projections are also described in section 3.

Following  the  analysis  of  the  projections,  the  position  of  each  of  the 
large  digits  is  known.  Before  being  passed  to  the  character  recognition  stage, 
each  digit  is  scaled  down  to  a  standard  size.  Section  4  describes  the  scaling 
process.  Section  5  then  describes  the  weighted  mask  matching  method  used  for 
character recognition. In order to optimise the mask weights, and also to
enable the extension of the system to other character sets, an automatic mask
generation technique was developed, and this is outlined in the second part of
section 5.

2  THE  BINARISER 

The information in the film scale area is contained in a pattern of white
lines on a black background. Reducing the video signal to this two-level
pattern has several advantages over processing the full range of grey levels. By
removing variations between scales, which carry no useful information, it
produces a more standard input for further processing; it reduces the size of
video memory required; and, following from these two factors, it reduces the
complexity, and therefore increases the speed, of the subsequent processing,
particularly that done by the hardware.

For many reasons, the image on the film is highly variable. Factors which
vary include:

- the density of 'black' and 'white'

- differences between main digit 'white' and scale line 'white'

- contrast

- shading, that is, the density of 'black' varies from one side of the
  picture to another, sometimes by an amount comparable to the
  difference between white and black at any given point

- noise, that is, unwanted items in the picture such as dirt or, in the
  case of very 'thin' film, the grain of the film.

These properties are fairly constant along a given film, so the following
setting up procedure, performed for the first frame of each film, is used to
reduce some of the variations between films before the video signal is passed
through the binariser. First, the position of a variable density filter, placed
in front of the television camera, is automatically varied until the 'white'
part of the signal reaches a predetermined threshold. The signal bias is then
adjusted so that 'black' is at a defined level. Finally, if the dynamic range
of the signal is less than half, or even a quarter, of the maximum allowed, then
the video signal is electronically amplified by a factor of two or four
respectively. When setting up the white and black levels, the signal is only
tested over a given, computer controlled, area so that the features of interest
can be optimised. For example, the 'white' surrounding the scale is much whiter
than the lines on the scale and is allowed to saturate in order to optimise the
setting up for the scale. Further, as protection against small noise features,
the detected maxima and minima must last for a minimum total length of time
(accumulated anywhere in the control area) or they are ignored.
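
The final step of the setting up procedure, the choice of amplification factor, can be sketched as below. The function name and the numeric levels in the example are illustrative assumptions; the Report does not give the signal units or the exact comparison used at the boundaries.

```python
def select_gain(white_level, black_level, max_range):
    """Choose the amplification factor from the measured dynamic range.

    All three arguments are in the same (hypothetical) signal units.
    """
    dynamic_range = white_level - black_level
    if dynamic_range < max_range / 4:
        return 4  # less than a quarter of the allowed range: amplify by four
    if dynamic_range < max_range / 2:
        return 2  # less than half of the allowed range: amplify by two
    return 1      # range already adequate: no extra amplification


gain_low_contrast = select_gain(white_level=50, black_level=0, max_range=256)
gain_medium_contrast = select_gain(white_level=100, black_level=0, max_range=256)
gain_full_contrast = select_gain(white_level=200, black_level=0, max_range=256)
```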

These  processes  reduce  some  of  the  film  variability  problems,  but  noise, 
shading  and  different  levels  of  white  remain.  The  effects  of  these  are  too 
great  to  allow  the  use  of  the  simplest  binarising  technique,  namely  comparing 
the  signal  with  a  fixed  global  threshold.  A  variable  threshold  must  be  used, 
the  value  of  which  depends  on  the  local  nature  of  the  picture.  In  other  words, 
the picture must be analysed by comparing each point with its neighbours.
Several techniques for generating local thresholds are described in Refs 1 and 2.
Most of these techniques require consideration of an area surrounding a given
point and are therefore slow. In order to produce the binarised signal in real
time, the process described here considers only the neighbouring points in the
same television scan line. However, as most of the features of interest consist
mainly of vertical lines, this restriction does not cause serious degradation.

Fig  5  illustrates  the  basic  principles  of  the  binarising  process.  The 
upper  part  of  the  Figure  shows  a  block  diagram  of  the  stages  of  processing  the 
television  signal,  and  the  lower  part  shows  the  waveforms  produced  at  the 
labelled  points  as  the  camera  scans  across  a  vertical  white  line. 

The video signal is passed down a tapped delay line: the signal to be
binarised is that which is present at the centre tap (this is the CENTRE signal,
labelled A in the Figure). Signals present at the other taps on the line
represent the television picture during the 2 µs preceding, and the 2 µs
succeeding, the centre. The maximum and minimum values of the signals at all the
taps are detected continuously (yielding waveforms B and C) and the average of
the maximum and minimum is taken (D). The CENTRE signal value is compared with
this average. If both black and white signals are present within the delay line,
the difference between CENTRE and average will be substantial, except for the
brief periods of transition of A from black to white and vice versa, thus
allowing a reliable decision as to whether the centre is black or white. This is
shown by the central portion of signal F in Fig 5. If, however, the signal in
the delay line is all black or all white, then minimum, maximum, average and
centre are approximately the same, and the result of the comparison depends on
small, random, picture variations, producing 'speckle' as at the ends of signal
F. This speckle is suppressed by making use of the knowledge that the picture
comprises white lines on a dark background, implying that a substantially
constant signal is always black. The detected minimum, C, is adjusted so that it
slightly exceeds the peak of the 'noise' signal (see below). This shifted signal
is E in Fig 5. Whenever E is greater than the detected maximum B, the binary
video output is set to the black level. This is done, as shown in Fig 5, by
interpreting the result (waveform G) of comparing the maximum and shifted
minimum as a 'permit' signal which is combined with F using the logical 'and'
function to produce the binary video.
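
A software analogue of this process, applied to one sampled scan line, might look like the sketch below. It is an illustration only: the real binariser is analogue hardware working on a continuous signal, and the function name, the sample-based window `half_window` and the numeric levels are assumptions. The letters in the comments refer to the waveforms of Fig 5.

```python
def binarise_line(signal, half_window, noise_shift):
    """Binarise one scan line using only neighbours in the same line.

    signal      -- grey levels along the line (larger means whiter)
    half_window -- samples considered either side of the centre
    noise_shift -- amount added to the local minimum to clear the noise
    """
    output = []
    for k, centre in enumerate(signal):              # A: centre tap
        taps = signal[max(0, k - half_window):k + half_window + 1]
        local_max = max(taps)                        # B
        local_min = min(taps)                        # C
        average = (local_max + local_min) / 2        # D
        white_guess = centre > average               # F
        shifted_min = local_min + noise_shift        # E
        permit = not (shifted_min > local_max)       # G: E > B forces black
        output.append(1 if (white_guess and permit) else 0)
    return output


# A narrow white line on a slightly noisy black background.
line = [10, 11, 10, 12, 10, 80, 82, 81, 10, 11, 10, 12, 10]
binary = binarise_line(line, half_window=2, noise_shift=5)

# A window containing only white is treated as constant, hence black.
all_white = binarise_line([80] * 9, half_window=2, noise_shift=5)
```

The second call shows the behaviour on a uniformly white stretch, which comes out black because the window never contains both levels.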

The adjusted minimum level is set manually via the computer to give an
optimum picture. Originally it was intended to monitor the noise signal and
allow the computer to determine the appropriate adjustment, but this has not
been implemented. Because the noise characteristics of a particular film do
not vary significantly from frame to frame, the determination of the shift is
generally required only once per film.

In order to minimise the effect of shading across the film, the delay time
should be short. It was chosen to be approximately the same as the spacing of
the scale lines so that when binarising the scale some white and some black are
always present. However, this is not satisfactory for horizontal white lines
such as the top of a main digit '7' because in the middle the delay line
contains only white signal which, as described above, leads to a black output
from the binariser. To overcome this, the computer can select a longer delay
when analysing main digits. This spans the width of the main digits and fills in
the horizontal lines, but is much less resistant to shading.

Around the scale area is a wide white border which, under the system
described above, would be output as black except for narrow bands at the edges
of the scale area. To ease the process of finding the scale area boundaries,
it is desirable that the border be white. This is accomplished by an additional
function, not shown in Fig 5, which takes advantage of the fact that the border
is much whiter than the scale features. The white level defined for setting up
the variable density filter position is used as a threshold and any signal
outside the control area which exceeds this is set to white.

Fig 6 shows the binarised version of Fig 2.

3  THE  VIDEO  MEMORY,  PROJECTIONS  AND  THEIR  ANALYSIS 
3.1 The video memory

The two-level video signal is sampled at a rate which produces 1024
samples  along  each  television  scan  line.  A  special  digital  memory  has  been 
built  to  receive  this  data.  In  it  can  be  stored  a  block  of  up  to  64  lines 
starting  anywhere  in  the  picture.  The  computer  specifies  the  first  and  last 
lines  required  and  then  instructs  the  memory  to  acquire  the  information.  The 
hardware  waits  until  a  new  scan  of  the  picture  starts  and  thus  reads  all  the 
data  lines  from  the  same  scan. 

The  process  of  data  acquisition  is  illustrated  in  Fig  7  which  shows  a  part 
of  the  video  memory  capable  of  storing  16  scan  lines.  The  complete  memory 
comprises four such bands. In order to obtain 1024 samples along each scan
line, the video signal is sampled at 19 MHz. The video memory is not capable
of accepting the single-bit samples at this rate, so they are first formed into
16-bit words by a serial-to-parallel converter, giving a word rate of 19/16 MHz.
Fig  7  shows  the  positions  in  which  the  first  few  samples  from  the  first  three 
and  the  sixteenth  scan  lines  are  stored.  This  arrangement  is  necessary  to 
enable  the  data  to  be  read  in  at  the  required  rate,  but  is  inconvenient  for 
further  processing.  After  all  the  data  has  been  captured,  a  'reorientation' 
process  is  performed  which  re-writes  the  data  into  the  more  convenient  form 
shown  in  the  lower  part  of  the  Figure. 

It  is  possible  to  access  the  contents  of  the  video  memory  (both  to  read 
and  write)  from  the  computer.  This  facility  is,  however,  only  used  to  test 
memory  function.  For  scale  reading  the  data  goes  through  further  processing  by 
the  hardware  before  being  passed  to  the  computer. 

3.2  X  and  Y  projections 

Although the general format of the scale area is constant, the positions
of the various features are not fixed and must be determined for each scale.
This applies even to the boundaries of the scale area whose positions relative
to the television camera can move: the vertical position changes from film to
film, and the horizontal position varies as the film is carried through its
holder or transport. Feature positions are determined by analysing projections
of different areas of the picture. This section defines X and Y projections
and describes their production by the hardware.

The binary picture can be thought of as a matrix of elements a(i,j)
which can take only two values, 0 or 1. Any rectangular area in the
picture is a submatrix restricted by limits such as r ≤ i ≤ s and t ≤ j ≤ u.
The projections of this submatrix on to the horizontal (X) and vertical (Y) axes
are defined to be:

$$\mathrm{PX}(i) = \sum_{j=t}^{u} a(i,j) \quad \text{for } r \le i \le s$$

and

$$\mathrm{PY}(j) = \sum_{i=r}^{s} a(i,j) \quad \text{for } t \le j \le u$$

respectively.
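
The two definitions can be written directly in software. The sketch below is an illustration, not the hardware method: it assumes the picture is held as a list of scan lines, so that element a(i,j) is `picture[j][i]` with i the column and j the line, and it uses zero-based, inclusive limits.

```python
def x_projection(picture, r, s, t, u):
    """PX(i) for r <= i <= s: ones in column i, summed over lines t..u."""
    return [sum(picture[j][i] for j in range(t, u + 1))
            for i in range(r, s + 1)]


def y_projection(picture, r, s, t, u):
    """PY(j) for t <= j <= u: ones in line j, summed over columns r..s."""
    return [sum(picture[j][i] for i in range(r, s + 1))
            for j in range(t, u + 1)]


# A small binary picture: a vertical stroke with a short horizontal spur.
picture = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
px = x_projection(picture, 0, 3, 0, 2)
py = y_projection(picture, 0, 3, 0, 2)
```

The vertical stroke shows up as a peak in the X-projection, which is why vertical scale lines and the cursor are located from X-projections.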

The Y-projection is produced on a line-by-line basis directly from the
sampled binary video signal. The computer specifies the area of interest and,
for  each  required  line,  the  hardware  accumulates  the  number  of  ones  between  the 
first  and  last  elements  specified.  This  total  can  then  be  read  by  the  computer. 
This  must  be  done  before  the  next  line's  total  is  available:  the  hardware 
detects  failure  to  do  this  and  causes  the  computer  program  to  be  interrupted. 

The X-projection is produced from data in the video memory (and the area
covered is therefore a maximum of 64 lines high). The function is an integral
part of the video memory structure. There are four mask registers of 16 bits
each, with each bit corresponding to a line of the video memory. The computer
can write values into the registers with a '1' in each position corresponding to
a line to be included in the projection. The computer also outputs the first
column number of interest. When the X-projection value is then requested, a
logical 'and' function is performed, bit by bit, on the mask registers and the
data column, and the resulting accumulated number of ones is returned to the
program. The column number can automatically be either incremented or
decremented, if required, so that the computer can simply read in consecutive
projection totals.
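
The masked X-projection can be imitated in software as below. This is a sketch under assumptions that are not in the Report: each column of the video memory is represented as a 64-bit integer, bit k of that integer is taken to be line k, and register n is taken to cover lines 16n to 16n + 15. The function names are hypothetical.

```python
def combine_masks(registers):
    """Join four 16-bit mask registers into one 64-bit line mask."""
    mask = 0
    for band, register in enumerate(registers):
        mask |= (register & 0xFFFF) << (16 * band)
    return mask


def x_projection_value(column_word, registers):
    """AND the column with the mask and count the ones that remain."""
    return bin(column_word & combine_masks(registers)).count("1")


def read_projection(columns, registers, first, count, step=1):
    """Read consecutive totals, stepping the column number up or down."""
    return [x_projection_value(columns[first + k * step], registers)
            for k in range(count)]


# Three columns of a hypothetical video memory.
columns = [0b1111, 0b1010, (1 << 16) | 1]
lines_0_and_1 = [0b0011, 0, 0, 0]          # include only lines 0 and 1
forward = read_projection(columns, lines_0_and_1, first=0, count=3)
backward = read_projection(columns, lines_0_and_1, first=2, count=3, step=-1)
with_line_16 = x_projection_value(columns[2], [0b0011, 0b0001, 0, 0])
```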

3.3  Projection  analysis  for  feature  position  detection 

The computer program specifies and analyses a sequence of projections in
order to determine the positions of the various features in the picture,
including the position of the cursor along the scale. Figs 8 to 11 illustrate
this process: they are all projections of parts of Fig 6. Ref 3 describes
another example of the use of projection analysis.

In  general,  where  boundaries  are  clear,  simple  threshold  crossing  is  used 
to determine their positions. Where the boundaries are less obvious, for
example  because  the  features  may  overlap  or  because  noise  may  be  significant, 
then  the  analysis  routines  search  for  local  maxima  and  minima  to  determine 
positions. 

First the boundaries of the scale area are required. Fig 8 shows the
X-projection of a band of 64 lines across the whole width of the picture,
approximately in the centre. The positions of the left and right edges are
obvious. These edges then define the first and last columns required for the
Y-projection of the whole height of the picture, shown in Fig 9. The top of the
picture is at the left of the Figure. Again, the positions of the top and
bottom edges are clear. Between the top and bottom edges, the most prominent
features are the peaks produced by the scale lines. The two maxima are found,
then the local minima which define the top and bottom of the upper and lower
sets of lines respectively. The gap between the two sets of lines in which the
cursor appears is known to be only seven lines high. Because the gap is so
narrow, its projection is easily distorted by noise, or by slight misalignment
between the film and television scan horizontals, so its position is taken to be
that of the seven smallest values between the peaks.

The  remaining  information  required  from  this  projection  is  the  position  of 
the  main  scale  digits.  They  can  be  in  one  of  two  bands,  the  upper  one  of  which 
is  considered  first.  The  total  white  in  the  band,  that  is,  the  sum  of  all  the 
projection  values,  is  compared  with  a  threshold  to  determine  whether  or  not 
digits  are  present.  In  the  current  example,  the  upper  main  digit  band  is  almost 
entirely  black,  so  the  program  proceeds  to  consider  the  lower  band.  Having 
decided  that  digits  are  present,  their  precise  boundaries  are  determined  by 
thresholding. 

Following  this  analysis  of  the  Y-projection,  the  area  containing  both  the 
top  set  of  scale  lines  and  the  cursor  is  read  into  the  video  memory.  Fig  10a 
shows  the  X-projection  of  the  cursor  band.  Apart  from  some  noise,  the  cursor  is 
the only feature present and is readily detectable. In order to calculate the
scale reading, the position of the centre of the cursor must be estimated: the
position which most closely balances the sum of cursor projection values to its
left and right is used.
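
The balancing rule for the cursor centre can be sketched as follows. The function name is hypothetical, the result is a whole column index, and ties are resolved in favour of the leftmost column; the Report does not say how ties or sub-column positions are handled.

```python
def balance_point(projection):
    """Return the index whose left and right sums are most nearly equal."""
    total = sum(projection)
    best_index, best_difference = 0, None
    left = 0
    for k, value in enumerate(projection):
        right = total - left - value       # sum of values right of column k
        difference = abs(left - right)
        if best_difference is None or difference < best_difference:
            best_index, best_difference = k, difference
        left += value                      # column k joins the left-hand sum
    return best_index


# X-projection of a cursor band: one roughly symmetric feature.
cursor_projection = [0, 0, 1, 5, 7, 5, 1, 0]
centre = balance_point(cursor_projection)
```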

Fig 10b shows the X-projection of the upper set of scale lines. The
simplest way to calculate the scale reading would be to interpolate the cursor
position between the first and last lines. However, the error in the scan
linearity of the television camera may be up to 2% of the picture width and
could cause significant error in this calculation. To overcome this, the scale
lines are counted (both before and after the cursor, as a check) and
interpolation of the cursor position is just between those lines closest to it.
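
The benefit of interpolating only between the two nearest lines can be seen in a small sketch. The line positions below are invented to imitate a non-linear scan, and the function name is an assumption; the result is expressed in units of one line spacing measured from the first line.

```python
def scale_fraction(line_positions, cursor_position):
    """Locate the cursor in units of line spacing from the first line.

    Interpolation uses only the pair of lines that bracket the cursor, so
    scan non-linearity elsewhere along the scale does not affect the result.
    """
    for k in range(len(line_positions) - 1):
        left, right = line_positions[k], line_positions[k + 1]
        if left <= cursor_position <= right:
            return k + (cursor_position - left) / (right - left)
    raise ValueError("cursor lies outside the scale lines")


# Line spacing grows across the picture, as a non-linear scan might produce.
lines = [100.0, 150.0, 205.0, 265.0]
local = scale_fraction(lines, 177.5)

# Interpolating between the first and last lines only, for comparison.
overall = (177.5 - lines[0]) / (lines[-1] - lines[0]) * (len(lines) - 1)
```

Here the cursor is exactly midway between the second and third lines, which the local method reports as 1.5 while the end-to-end interpolation gives about 1.41.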

Finally,  the  band  containing  the  main  scale  digits  is  read  into  the  video 
memory.  From  its  X-projection,  shown  in  Fig  11,  the  horizontal  positioning  of 
the digits can be determined.

Thus,  at  this  stage,  the  fractional  part  of  the  scale  reading  has  been 
determined,  and  each  main  digit  is  stored  in  a  known  area  of  the  video  memory 
ready  to  be  passed  to  the  final  processing  stages. 

4  SCALING 

The main scale digits as they are stored in the video memory are not
immediately suitable as input for character recognition. Firstly, they are too
big, being typically about 65 columns wide by about 40 lines high. (Resolution
is much greater horizontally than vertically.) Secondly, the size is not
constant for various reasons including changes in optical magnification (at both
kinetheodolite and television camera) and 'noise' in the binarising process.
Thus it is necessary to scale the digits down to a suitable size. This was
chosen to be 10 columns by 20 rows which reflects the actual aspect ratio of the
digits on the film and gives a nominal line width of one element.

The  scaling  process  is  illustrated  in  Fig  12. 

If the digit is, say, N elements wide, it can be divided into x columns
of width A and (10 - x) columns of width (A + 1) where A is the integer part
of N/10, and (10 - x) is the remainder. For example:

N = 66 gives A = 6 and x = 4

So 66 elements can be divided into four columns of width 6 and six columns
of width 7. Similarly, the height can be divided into 20 parts.

From its knowledge of the boundaries of the digit, the software calculates
'A' and 'x'. The pattern in which the groups of size A and A + 1 are
distributed is predetermined for each possible value of x. The information is
stored in a table in computer memory in the form of bit patterns where '0'
corresponds to group size A and '1' corresponds to group size A + 1. The
patterns were chosen to distribute the different sized groups as evenly as
possible.
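
The division into groups can be sketched as below. The Report does not list the stored bit patterns, so the even spreading used here (an error-accumulation rule of the kind used in line drawing) is a stand-in for the real table, and the function name is hypothetical.

```python
def group_sizes(n, groups=10):
    """Split n elements into `groups` groups of size A or A + 1.

    Returns (A, x, sizes) where x groups have size A and the remaining
    groups have size A + 1, spread as evenly as this simple rule allows.
    """
    a, remainder = divmod(n, groups)
    sizes = []
    accumulator = 0
    for _ in range(groups):
        accumulator += remainder
        if accumulator >= groups:          # time for one of the larger groups
            accumulator -= groups
            sizes.append(a + 1)
        else:
            sizes.append(a)
    return a, groups - remainder, sizes


a, x, sizes = group_sizes(66)
# The equivalent bit pattern: 0 for size A, 1 for size A + 1.
pattern = [size - a for size in sizes]
```

For N = 66 this gives A = 6 and x = 4, matching the worked example, with the six wider columns interleaved among the four narrower ones.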

The scaling process is carried out by special hardware, under firmware
control. For both the horizontal and vertical directions, the software outputs
the starting point in the video memory, the pattern, and the smaller group size
(ie 'A'). The grouping process described above divides the digit into
rectangles of, in any particular instance, four possible sizes, as shown in
Fig 12. If S(i), i = 1,2,3,4 are the rectangle sizes, the software calculates
the four values S(i) × N/100 (where N is a constant, distinct from the digit
width above, at present equal to 50), rounded to the nearest integer, and sends
them out as threshold values. The hardware, using the patterns, knows the size
of each rectangle. It counts the number of ones present and compares the total
with the given threshold for the particular rectangle size in order to decide
whether the scale digit element should be '0' or '1'.
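
The thresholding of each rectangle can be sketched in software as follows. This is an illustration, not the hardware design: the function name is hypothetical, the comparison is assumed to be "greater than or equal to", and the tiny picture stands in for a real digit. The `percent` argument plays the role of the constant set to 50 in the text.

```python
def scale_digit(picture, col_sizes, row_sizes, percent=50):
    """Reduce a binary picture to one element per rectangle.

    col_sizes and row_sizes give the widths and heights of the groups; an
    output element is 1 when the count of ones in its rectangle reaches
    size * percent / 100, rounded to the nearest integer.
    """
    scaled = []
    top = 0
    for height in row_sizes:
        row = []
        left = 0
        for width in col_sizes:
            ones = sum(picture[y][x]
                       for y in range(top, top + height)
                       for x in range(left, left + width))
            threshold = int(width * height * percent / 100 + 0.5)
            row.append(1 if ones >= threshold else 0)
            left += width
        scaled.append(row)
        top += height
    return scaled


# A 5-column by 3-line picture reduced to 2 by 2 using unequal groups.
picture = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
scaled = scale_digit(picture, col_sizes=[2, 3], row_sizes=[1, 2])
```

The isolated one in the right-hand part of the picture is discarded because it falls well short of half its rectangle, which is how the scaling also suppresses small noise features.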

The 10 x 20 scaled digit array is stored in a special memory in the
hardware. The data is available to the computer for testing purposes, but is
normally read directly by the character recognition hardware.

5  CHARACTER  RECOGNITION  AND  MASK  GENERATION 

5.1 Character recognition

The main scale digits are recognised using the process of correlation with
weighted masks. The principles of this technique are illustrated in Fig 13, and
it is explained briefly below. Ref 2 describes this and other character
recognition techniques: this one was chosen because of the possibility of
automating the design of the masks. Ref 4 describes another implementation of
this technique.

If binarised and scaled samples of a particular numeral were always
identical, then a very simple mask matching technique would always produce
correct recognition: a sample of each digit would be stored, and each unknown
would be compared with all the samples. In the ideal situation there would be
one perfect match. In practice, however, 'noise' of various types is introduced
at each of the processing stages involved in producing the binarised, scaled
digit, resulting in wide variations such as those between the unknown digit and
the simple '5' mask illustrated in Fig 13. Simple matching in this case results
in the central '5' being recognised as a '6'. There are only three small areas
of difference between a '5' and a '6' of this sort of design and these are
indicated by '+' and '-' symbols in Fig 13. If these are given more importance,
say by giving them ten times the 'weight' per element, then as shown by the sums
of products, S, in the right-hand part of Fig 13, the '5' is correctly
recognised.

It is easy to design, by eye, weighted masks to distinguish between two
digits. Extending this principle to distinguish between 10 digits
simultaneously is much more difficult and requires the use of a range of weight
values. The computer program which was written to generate the weighted masks is
described in section 5.2.

The mask values are stored in a special memory in the hardware. On
command from the computer, the contents of the scaled data memory are passed to
the character recognition hardware which calculates the 10 correlation values.
The arithmetic for the 10 masks is done in parallel and the process is therefore
very fast. The totals are read into the computer and normalised (see the
following section). The software then decides, on the basis of their absolute
and relative values, which digit, if any, is contained in the scaled data memory.
Having found the largest total, the program checks both that it is greater than
a certain fixed threshold and that the difference between it and the second
largest total is greater than another threshold amount before accepting that the
corresponding digit is indeed present. Varying these thresholds alters the
balance between the number of misread samples and the number of samples rejected
as unreadable.
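
The correlation and the two acceptance tests can be sketched as below. Everything specific in the example is an assumption: the function name, the use of 0 and 1 as digit element values, the four-element "digits", the mask weights and the threshold values. In the real system there are 10 masks of 200 elements each and the correlation sums are formed by hardware.

```python
def recognise(digit, masks, norms, min_total, min_margin):
    """Return the index of the recognised mask, or None if rejected.

    digit  -- flattened scaled digit elements
    masks  -- one list of weights per candidate character
    norms  -- normalising value for each mask's total
    """
    totals = [sum(d * m for d, m in zip(digit, mask)) / norm
              for mask, norm in zip(masks, norms)]
    order = sorted(range(len(totals)), key=lambda k: totals[k], reverse=True)
    best, second = order[0], order[1]
    # Accept only a total that is large in itself and clearly ahead of the rest.
    if totals[best] > min_total and totals[best] - totals[second] > min_margin:
        return best
    return None


# Two toy masks over four elements, each favouring a different half.
masks = [[1, 1, -1, -1], [-1, -1, 1, 1]]
norms = [2, 2]
accepted = recognise([1, 1, 0, 0], masks, norms, min_total=0.5, min_margin=0.5)
rejected = recognise([1, 0, 1, 0], masks, norms, min_total=0.5, min_margin=0.5)
```

The second sample matches both masks equally badly and is rejected as unreadable rather than being forced into one class.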

5.2  Automatic  mask  generation 

A process for automatically designing weighted character recognition
masks was developed partly because of the complexity of simultaneously
differentiating between 10 digits, and partly in anticipation of a requirement
to recognise other digit fonts.

Several samples of each digit were collected to provide design data for
the process. For each digit, the samples were added together and normalised to
give a composite representation with values in the range -10 to +10. The
process designs one weighted mask at a time. Let the elements of this be m(i),
the elements of the composite digit to which it corresponds be a(i), and the
elements of the remaining composite digits be b(j,i), where 1 ≤ i ≤ 200 and
1 ≤ j ≤ 9.

Let TA be the correlation total of the mask with the composite digit
for which it is being designed, ie

$$\mathrm{TA} = \sum_{i=1}^{200} a(i)\, m(i)$$

and let TB(j) be the correlation total of the mask with the jth composite
digit, ie

$$\mathrm{TB}(j) = \sum_{i=1}^{200} b(j,i)\, m(i)$$

It is required that TA shall be significantly greater than all the TB(j).
This  will  be  so  if  the  ratios  TB(j)/TA  are  small,  and  the  design  task  can 
therefore  be  defined  as  choosing  the  m(i)  ,  within  a  fixed  range  of  values, 
such  that  the  nine  ratios  are  minimised  simultaneously. 

Initially,  the  m(i)  are  set  equal  to  the  a(i)  .  The  following  process 
is  then  repeatedly  performed. 

The  nine  correlation  ratios  are  calculated  and  the  maximum  is  determined. 
For  this  maximum,  a  digital,  approximate,  differentiation  is  used  to  find  the 
m(i)  to  which  the  ratio  is  most  sensitive,  and  whether  it  should  be  increased 
or  decreased  in  order  to  reduce  the  value  of  the  ratio.  The  m(i)  is  then 
changed  by  +1  or  -1  ,  as  appropriate,  provided  both  that  the  new  value  is 
still  within  the  allowed  range  and  that  this  change  would  not  increase  any  of 
the  other  eight  ratios.  If  the  most  sensitive  m(i)  cannot  be  changed,  then 
the  next  one  is  found,  and  so  on  until  a  change  is  made,  at  which  point  the 
whole  process  is  repeated. 

If  no  change  can  be  made  the  process  terminates.  In  practice,  before  this 
point  is  reached  all  the  ratios  become  less  than  zero,  and  this  also  terminates 
the  process.  The  final  value  of  TA  for  each  digit  is  used  to  normalise  the 
totals  produced  by  the  character  recognition  hardware,  as  described  previously. 
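
The iterative design process can be sketched as follows. This is a loose illustration rather than the program described: the function names are hypothetical, the "approximate differentiation" is replaced by simply trying each unit change and measuring the effect, a guard keeping TA positive and a step limit have been added, and the four-element composites stand in for the 200-element ones.

```python
def correlation_ratios(mask, target, others):
    """Return TB(j)/TA for every competing composite digit."""
    ta = sum(a * m for a, m in zip(target, mask))
    return [sum(b * m for b, m in zip(other, mask)) / ta for other in others]


def design_mask(target, others, limit=10, max_steps=10000):
    """Adjust mask weights one unit at a time to reduce the worst ratio."""
    mask = list(target)                         # start with m(i) = a(i)
    for _ in range(max_steps):
        current = correlation_ratios(mask, target, others)
        if max(current) < 0:                    # all ratios negative: finished
            break
        worst = current.index(max(current))
        candidates = []
        for i in range(len(mask)):
            for delta in (+1, -1):
                if abs(mask[i] + delta) > limit:
                    continue                    # outside the allowed range
                mask[i] += delta                # trial change
                if sum(a * m for a, m in zip(target, mask)) > 0:
                    trial = correlation_ratios(mask, target, others)
                    gain = current[worst] - trial[worst]
                    # Keep changes that help the worst ratio and harm none.
                    if gain > 0 and all(n <= c for n, c in zip(trial, current)):
                        candidates.append((gain, i, delta))
                mask[i] -= delta                # undo the trial change
        if not candidates:                      # no permissible change remains
            break
        _, i, delta = max(candidates)           # most effective permitted change
        mask[i] += delta
    return mask


target = [10, 10, -10, -10]
others = [[10, -10, 10, -10], [-10, -10, 10, 10]]
mask = design_mask(target, others)
final_ratios = correlation_ratios(mask, target, others)
```

In this toy case one unit change is enough to drive the worst ratio from zero to a negative value, at which point the loop stops, mirroring the termination condition described above.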

6  CONCLUSIONS 

This Report describes briefly several image processing and analysis
techniques which have been developed for a specific application but which are
potentially applicable to a wide variety of problems. The implementation of
these techniques using the combination of hardware and software described in the
Report is capable of reading the information in an Askania kinetheodolite film
scale area in less than 1 second.

Fig 8 X-projection of central band of the scale

Fig 10b X-projection of scale lines